Unification-Based Glossing

Authors

  • Vasileios Hatzivassiloglou
  • Kevin Knight

Abstract

We present an approach to syntax-based machine translation that combines unification-style interpretation with statistical processing. This approach enables us to translate any Japanese newspaper article into English, with quality far better than a word-for-word translation. Novel ideas include the use of feature structures to encode word lattices and the use of unification to compose and manipulate lattices. Unification also allows us to specify abstract features that delay target-language synthesis until enough source-language information is assembled. Our statistical component enables us to search efficiently among competing translations and locate those with high English fluency.

1 Background

JAPANGLOSS [Knight et al., 1994; 1995] is a project whose goals are to scale up knowledge-based machine translation (KBMT) techniques to handle Japanese-English newspaper MT, to achieve higher quality output than is currently available, and to develop techniques for rapidly constructing MT systems. We built the first version of JAPANGLOSS in nine months and recently participated in an ARPA evaluation of MT quality [White and O'Connell, 1994]. JAPANGLOSS is an effort within the larger PANGLOSS [NMSU/CRL et al., 1995] MT project.

Our approach is to use a KBMT framework, but to fall back on statistical methods when knowledge gaps arise (as they inevitably will). We syntactically analyze Japanese text, map it to a semantic representation, then generate English. Figure 1 shows a sample translation.

[Figure 1: sample translation. OUTPUT: The new company plans to establish in February.]

Parsing is bottom-up, driven by an augmented context-free grammar whose format is roughly like that of [Shieber, 1986]. Our grammar rules look like this: [rule example missing from this extract]

The semantic representation contains conceptual tokens drawn from the 70,000-term SENSUS ontology [Knight and Luk, 1994]. Semantic analysis proceeds as a bottom-up walk of the parse tree, in the style of Montague and Moore [Dowty et al., 1981; Moore, 1989]. Semantics is compositional, with each parse tree node assigned a meaning based on the meanings of its children. Leaf node meanings are retrieved from a semantic lexicon, while meaning composition rules handle internal nodes. Semantic rules and lexical entries are sensitive to syntactic structure, ...
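The bottom-up, compositional interpretation described in the abstract can be illustrated with a small sketch. This is not the JAPANGLOSS implementation: the lexicon entries, the rule table, the concept names (`COMPANY`, `NEW`, `PLAN`), and the functions `unify` and `interpret` are all hypothetical, and feature structures are modeled simply as nested Python dictionaries without variables or reentrancy.

```python
from functools import reduce

def unify(a, b):
    """Unify two feature values; return None when they clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[key] = merged
            else:
                result[key] = b_val
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None

# Hypothetical semantic lexicon: leaf words mapped to feature structures.
# The concept tokens are invented for illustration, not taken from SENSUS.
LEXICON = {
    "new": {"mod": {"instance": "NEW"}},
    "company": {"instance": "COMPANY"},
    "plans": {"instance": "PLAN"},
}

# Hypothetical composition rules keyed by the parse-node label.
RULES = {
    # A phrase's meaning is the unification of its children's meanings.
    "NP": lambda kids: reduce(unify, kids),
    "VP": lambda kids: reduce(unify, kids),
    # A sentence attaches the subject meaning as the agent of the predicate.
    "S": lambda kids: unify(kids[1], {"agent": kids[0]}),
}

def interpret(tree):
    """Bottom-up walk: leaves come from the lexicon, internal nodes from rules."""
    if isinstance(tree, str):
        return LEXICON[tree]
    label, *children = tree
    return RULES[label]([interpret(child) for child in children])

parse = ("S", ("NP", "new", "company"), ("VP", "plans"))
meaning = interpret(parse)
```

Here each internal node's meaning depends only on the meanings of its children, and a failed unification (for example, two children that disagree on a feature value) propagates upward as `None`, which is one simple way to reject an incompatible analysis.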

Similar resources

The Ubiquity of the Gloss

This paper argues that glossing is an essential stage in the borrowing of writing systems. I use the term “glossing” in a somewhat extended sense to refer to a process where a text in one language is prepared (annotated, marked) to be read in another. I argue that this process of “vernacular reading” – reading a text written in the script, orthography, lexicon and grammar of a more prestigious ...

The Effects of Oral Code-mixing and Glossing on Iranian EFL Learners' Vocabulary Knowledge

The current study investigated the effects of oral code-mixing and glossing on L2 vocabulary learning. To this end, 60 EFL learners studying at pre-university school were given a pre-test to make sure that they did not have any prior knowledge of the target words. Based on their scores in the pre-test, 36 pre-university students were selected and divided into three groups, including two experim...

Automatic interlinear glossing as two-level sequence classification

Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic gloss...

The Effect of Visual Representation, Textual Representation, and Glossing on Second Language Vocabulary Learning

In this study, the researcher chose three different vocabulary techniques (Visual Representation, Textual Enhancement, and Glossing) and compared them with traditional method of teaching vocabulary. 80 advanced EFL Learners were assigned as four intact groups (three experimental and one control group) through using a proficiency test and a vocabulary test as a pre-test. In the visual group, stu...

Word-for-Word Glossing with Contextually Similar Words

Many corpus-based machine translation systems require parallel corpora. In this paper, we present a word-for-word glossing algorithm that requires only a source language corpus. To gloss a word, we first identify its similar words that occurred in the same context in a large corpus. We then determine the gloss by maximizing the similarity between the set of contextually similar words and the di...

Publication date: 1995